# Complete Setup Guide for Chatbot Excel Processor

## Prerequisites

### 1. Install Required Python Packages

In your Jupyter notebook, run:

```python
!pip install pandas openpyxl requests selenium webdriver-manager
```

### 2. For Selenium (Web Interface) Method

The command in step 1 already installs these two packages. To install only what the web interface method needs, run:

```python
!pip install selenium webdriver-manager
```

Then install ChromeDriver:

```python
from webdriver_manager.chrome import ChromeDriverManager
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

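# Download ChromeDriver (cached after the first run) and wrap its path in a Service
service = Service(ChromeDriverManager().install())

# Pass the service to Chrome so the downloaded driver is the one that gets used
driver = webdriver.Chrome(service=service)
driver.quit()  # Close the test browser again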
```

## Quick Start

### Step 1: Create Your Excel File

Your Excel file should have a column named "Questions" with one question per row (a different column name can be passed as `question_column`):

| Questions |
|-----------|
| What is AI? |
| How does machine learning work? |
| Explain neural networks |

### Step 2: Choose Your Method

**Method 1: API Approach (Recommended)**
- Use if your chatbot has an API endpoint
- Faster and more reliable
- Use the first script (ChatbotProcessor)

**Method 2: Web Interface Approach**
- Use if you need to interact with the web UI directly
- Slower but works with any web interface
- Use the second script (SeleniumChatbotProcessor)

### Step 3: Run the Code

#### For API Method:
```python
# Test connection first
test_chatbot_connection("http://localhost:8501")

# Process your questions
result_df = process_excel_questions(
    input_file="your_questions.xlsx",
    output_file="responses.xlsx",
    chatbot_url="http://localhost:8501"
)
```

#### For Selenium Method:
```python
result_df = process_excel_with_selenium(
    input_file="your_questions.xlsx",
    output_file="responses.xlsx",
    chatbot_url="http://localhost:8501",
    headless=False  # Set to True to hide browser
)
```

## Customization Guide

### 1. Modify API Endpoints

If your chatbot uses different API endpoints, update the `api_endpoints` list in the `send_question_api` method:

```python
api_endpoints = [
    f"{self.base_url}/your-custom-endpoint",
    f"{self.base_url}/api/your-endpoint"
]
```

### 2. Modify Request Payload

Update the `payloads` list to match your API's expected format:

```python
payloads = [
    {"your_field": question},
    {"custom_key": question}
]
```

### 3. Modify Response Parsing

Update the response parsing logic to match your API's response format:

```python
# In send_question_api method
return (data.get('your_response_field') or 
        data.get('your_answer_field') or 
        str(data))
```
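
If the answer is nested inside the JSON or the field name varies between endpoints, the lookup can be kept in one small helper. The sketch below is only an illustration: `extract_answer`, the dotted-path convention, and the field names are hypothetical and not part of the original script.

```python
import json
from typing import Any, Iterable


def extract_answer(data: Any, fields: Iterable[str] = ("response", "answer")) -> str:
    """Return the first non-empty field from a parsed JSON reply.

    Entries in `fields` may use dots for nested keys, e.g. "result.text".
    Falls back to the JSON text of the whole payload so nothing is lost.
    """
    for field in fields:
        value = data
        for key in field.split("."):
            if isinstance(value, dict) and key in value:
                value = value[key]
            else:
                value = None
                break
        if value not in (None, ""):
            return str(value)
    return json.dumps(data, ensure_ascii=False)


# Hypothetical reply shape, used only to show the nested lookup
print(extract_answer({"result": {"text": "Hello"}}, fields=("answer", "result.text")))  # Hello
```

Like the `or` chain above, this treats an empty string as missing and moves on to the next candidate field.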

### 4. For Selenium Method - Update Selectors

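Update the selectors to match your chatbot's HTML structure. In `find_submit_button`, entries starting with `//` are treated as XPath and everything else as CSS; `find_input_element` accepts CSS selectors only: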

```python
# In find_input_element method
input_selectors = [
    "your-custom-input-selector",
    "input[id='your-input-id']"
]

# In find_submit_button method
button_selectors = [
    "button[id='your-button-id']",
    "//button[contains(text(), 'Your Button Text')]"
]
```
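
Because the button list mixes CSS and XPath entries, it can help to check how a given string will be classified before adding it. The helper below is a hypothetical sketch, not code from the script: `locator_for` is an invented name, and the extra `./` and `(` prefixes are an assumption that goes slightly beyond the script's own `//` check.

```python
from typing import Tuple

# These strings equal selenium's By.XPATH and By.CSS_SELECTOR values,
# so a returned tuple can be unpacked into driver.find_element(*locator).
XPATH = "xpath"
CSS_SELECTOR = "css selector"


def locator_for(selector: str) -> Tuple[str, str]:
    """Classify a selector string as XPath or CSS based on its prefix."""
    stripped = selector.strip()
    if stripped.startswith(("/", "./", "(")):
        return (XPATH, stripped)
    return (CSS_SELECTOR, stripped)


print(locator_for("//button[contains(text(), 'Send')]"))
print(locator_for("button[id='your-button-id']"))
```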

## Troubleshooting

### Common Issues:

1. **"Cannot connect to chatbot"**
   - Ensure your chatbot is running at `http://localhost:8501` (or at the URL you pass as `chatbot_url`)
   - Check if the URL is correct
   - Try accessing the URL in your browser first

2. **"Could not find input element"** (Selenium method)
   - Inspect your chatbot's HTML to find the correct selectors
   - Update the `input_selectors` list with the correct CSS selectors

3. **"API endpoint not found"**
   - Check if your chatbot has an API endpoint
   - Use browser developer tools to see what requests are made
   - Update the API endpoints in the code

4. **"ChromeDriver not found"** (Selenium method)
   - Install ChromeDriver using webdriver-manager (shown above)
   - Or manually download and add to PATH
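
For the first issue, reachability can be checked from the notebook itself before starting a long run. This is a minimal standard-library sketch; `is_reachable` is a hypothetical helper and is separate from the `test_chatbot_connection` function used elsewhere in this guide.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP request to `url` gets any response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status (4xx/5xx).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False
```

Calling `is_reachable("http://localhost:8501")` should return `False` when nothing is listening on that port, which separates "server not running" from selector or payload problems.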

### Debugging Tips:

1. **Test with a single question first:**
```python
import pandas as pd

# Create a test file with one question
test_data = {"Questions": ["What is AI?"]}
test_df = pd.DataFrame(test_data)
test_df.to_excel("test.xlsx", index=False)

# Process the test file
result = process_excel_questions("test.xlsx", "test_output.xlsx")
```

2. **Use browser developer tools:**
   - Open your chatbot in browser
   - Press F12 to open developer tools
   - Watch the Network tab when you send a message
   - Note the request URL and payload format

3. **Enable detailed logging:**
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
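
Once the Network tab (tip 2) has shown the request URL and payload, the request can be rebuilt outside the browser to confirm the format before editing the script. The sketch below uses only the standard library; the `/api/chat` path and the `message` key are placeholders for whatever the Network tab actually shows.

```python
import json
import urllib.request


def build_chat_request(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request mirroring the browser's."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint and payload key: substitute the observed values.
request = build_chat_request("http://localhost:8501/api/chat", {"message": "What is AI?"})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request, timeout=10).read()
```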

## Example Complete Notebook Code

```python
# Cell 1: Install packages
!pip install pandas openpyxl requests selenium webdriver-manager

# Cell 2: Import and setup
import pandas as pd
import requests
# ... (copy the main script here)

# Cell 3: Create test data
sample_data = {
    "Questions": [
        "What is artificial intelligence?",
        "How does machine learning work?",
        "What are the benefits of using AI in business?"
    ]
}
df = pd.DataFrame(sample_data)
df.to_excel("my_questions.xlsx", index=False)
print("Created my_questions.xlsx")

# Cell 4: Test connection
test_chatbot_connection("http://localhost:8501")

# Cell 5: Process questions
result_df = process_excel_questions(
    input_file="my_questions.xlsx",
    output_file="responses.xlsx",
    chatbot_url="http://localhost:8501",
    delay_seconds=2.0
)

# Cell 6: View results
print(result_df.to_string())
```

## Performance Tips

1. **Adjust delay between requests:**
   - Increase `delay_seconds` if your chatbot is slow
   - Decrease for faster processing (but risk overwhelming the server)

2. **Use headless mode for Selenium:**
   - Set `headless=True` for faster processing
   - Set `headless=False` for debugging

3. **Process in batches:**
   - For large files, process in smaller batches
   - Save progress periodically

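4. **Resume from where you left off:**
   - The scripts skip rows whose response column is already filled (in the Selenium script this includes rows that hold an `Error: ...` message)
   - The Selenium script writes the workbook only after the last question, so an interrupted run saves nothing; to resume, pass a previously saved output file as `input_file`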
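
The batching and periodic-save tips can be sketched independently of Excel or Selenium. `process_in_batches` and its `ask`/`save` callables are hypothetical names for this illustration; in practice `ask` might wrap `processor.send_question` and `save` might call `df.to_excel(...)`.

```python
from typing import Callable, List, Sequence


def process_in_batches(
    questions: Sequence[str],
    ask: Callable[[str], str],
    save: Callable[[List[str]], None],
    batch_size: int = 10,
) -> List[str]:
    """Answer questions in order, calling `save` after each batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    responses: List[str] = []
    for start in range(0, len(questions), batch_size):
        for question in questions[start:start + batch_size]:
            responses.append(ask(question))
        # Checkpoint: everything answered so far is handed to `save`.
        save(list(responses))
    return responses


# Stand-in callables so the sketch runs without a chatbot.
checkpoints = []
answers = process_in_batches(
    ["q1", "q2", "q3"],
    ask=lambda q: q.upper(),
    save=lambda done: checkpoints.append(len(done)),
    batch_size=2,
)
print(answers)      # ['Q1', 'Q2', 'Q3']
print(checkpoints)  # [2, 3]
```

With a checkpoint after each batch, an interruption loses at most one batch of responses.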

## Security Notes

- Only use this with chatbots you trust
- Be careful with sensitive data in Excel files
- Consider using environment variables for URLs and credentials
- Test with non-sensitive data first
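
One way to keep the URL out of the notebook is to read it from the environment with a fallback. This is a small sketch; `CHATBOT_URL` is a variable name made up for the example, not something the scripts read on their own.

```python
import os

DEFAULT_CHATBOT_URL = "http://localhost:8501"


def get_chatbot_url() -> str:
    """Read the chatbot URL from the environment, falling back to the default."""
    # CHATBOT_URL is a hypothetical variable name chosen for this example.
    return os.environ.get("CHATBOT_URL", DEFAULT_CHATBOT_URL)


print(get_chatbot_url())
```

The result can then be passed as `chatbot_url` to `process_excel_questions` or `process_excel_with_selenium`.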

## Next Steps

1. Adapt the code to your specific chatbot's interface
2. Test with a small sample first
3. Run on your full dataset
4. Set up error handling and logging as needed

In [None]:
import pandas as pd
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import TimeoutException, NoSuchElementException
import logging

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class SeleniumChatbotProcessor:
    def __init__(self, chatbot_url: str = "http://localhost:8501", headless: bool = False):
        self.chatbot_url = chatbot_url
        self.driver = None
        self.headless = headless
        self.setup_driver()
    
    def setup_driver(self):
        """Initialize the Chrome WebDriver"""
        chrome_options = Options()
        if self.headless:
            chrome_options.add_argument("--headless")
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--disable-dev-shm-usage")
        chrome_options.add_argument("--disable-gpu")
        
        try:
            self.driver = webdriver.Chrome(options=chrome_options)
            self.driver.get(self.chatbot_url)
            time.sleep(3)  # Wait for page to load
            logger.info("WebDriver initialized successfully")
        except Exception as e:
            logger.error(f"Failed to initialize WebDriver: {e}")
            raise
    
    def find_input_element(self):
        """Find the input element on the page using various strategies"""
        input_selectors = [
            # Common Streamlit text input selectors
            "input[type='text']",
            "textarea",
            "input[data-testid='textInput-value']",
            "textarea[data-testid='textArea-value']",
            "input[class*='st-']",
            "textarea[class*='st-']",
            # Generic selectors
            "input[placeholder*='question']",
            "input[placeholder*='message']",
            "textarea[placeholder*='question']",
            "textarea[placeholder*='message']",
            # ID-based selectors
            "#message",
            "#question",
            "#input",
            "#chat-input"
        ]
        
        for selector in input_selectors:
            try:
                element = self.driver.find_element(By.CSS_SELECTOR, selector)
                if element.is_displayed() and element.is_enabled():
                    logger.info(f"Found input element with selector: {selector}")
                    return element
            except NoSuchElementException:
                continue
        
        return None
    
    def find_submit_button(self):
        """Find the submit button using various strategies"""
        button_selectors = [
            # Common button selectors
            "button[type='submit']",
            "input[type='submit']",
            "button[data-testid='baseButton-primary']",
            "button[class*='st-']",
            # Text-based selectors
            "//button[contains(text(), 'Send')]",
            "//button[contains(text(), 'Submit')]",
            "//button[contains(text(), 'Ask')]",
            "//button[contains(text(), 'Chat')]",
            "//input[@value='Send']",
            "//input[@value='Submit']"
        ]
        
        for selector in button_selectors:
            try:
                if selector.startswith("//"):
                    element = self.driver.find_element(By.XPATH, selector)
                else:
                    element = self.driver.find_element(By.CSS_SELECTOR, selector)
                
                if element.is_displayed() and element.is_enabled():
                    logger.info(f"Found submit button with selector: {selector}")
                    return element
            except NoSuchElementException:
                continue
        
        return None
    
    def send_question(self, question: str, timeout: int = 30) -> str:
        """Send a question to the chatbot and get the response"""
        try:
            # Find input element
            input_element = self.find_input_element()
            if not input_element:
                return "Error: Could not find input element on the page"
            
            # Clear and enter question
            input_element.clear()
            input_element.send_keys(question)
            time.sleep(1)
            
            # Find and click submit button
            submit_button = self.find_submit_button()
            if submit_button:
                submit_button.click()
            else:
                # Try pressing Enter if no button found
                from selenium.webdriver.common.keys import Keys
                input_element.send_keys(Keys.RETURN)
            
            # Wait for response
            time.sleep(3)
            
            # Try to find response element, passing the question so echoes can be filtered out
            response = self.get_response(question)
            return response
            
        except Exception as e:
            logger.error(f"Error sending question: {e}")
            return f"Error: {str(e)}"
    
    def get_response(self, question: str = "") -> str:
        """Extract the response from the page, ignoring text that only echoes `question`"""
        response_selectors = [
            # Common response area selectors
            "div[data-testid='chatMessage']",
            "div[class*='chat-message']",
            "div[class*='response']",
            "div[class*='st-chat-message']",
            "p[class*='st-']",
            "div[class*='st-'] p",
            "div.element-container p",
            "div.element-container div",
            # Last message selectors
            "div[data-testid='stChatMessage']:last-child",
            "div[data-testid='stMarkdownContainer']:last-child"
        ]
        
        for selector in response_selectors:
            try:
                elements = self.driver.find_elements(By.CSS_SELECTOR, selector)
                if elements:
                    # Get the last element (most recent response)
                    last_element = elements[-1]
                    text = last_element.text.strip()
                    if text and text != question.strip():  # Make sure it's not just echoing the question
                        logger.info(f"Found response with selector: {selector}")
                        return text
            except Exception:
                continue
        
        # Fallback: try to get any text that appeared after sending the question
        try:
            # Get all text from the body and try to extract new content
            body_text = self.driver.find_element(By.TAG_NAME, "body").text
            lines = body_text.split('\n')
            # Return the last non-empty line as the response
            non_empty_lines = [line.strip() for line in lines if line.strip()]
            if non_empty_lines:
                return non_empty_lines[-1]
        except Exception:
            pass
        
        return "Error: Could not extract response from page"
    
    def close(self):
        """Close the WebDriver"""
        if self.driver:
            self.driver.quit()
            logger.info("WebDriver closed")

def process_excel_with_selenium(
    input_file: str,
    output_file: str = None,
    question_column: str = "Questions",
    response_column: str = "Response",
    chatbot_url: str = "http://localhost:8501",
    delay_seconds: float = 2.0,
    headless: bool = False
):
    """
    Process questions from Excel using Selenium to interact with web interface
    """
    
    processor = None
    
    try:
        # Read Excel file
        logger.info(f"Reading Excel file: {input_file}")
        df = pd.read_excel(input_file)
        
        # Check if question column exists
        if question_column not in df.columns:
            raise ValueError(f"Column '{question_column}' not found in Excel file")
        
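        # Initialize response column if it doesn't exist
        if response_column not in df.columns:
            df[response_column] = ""
        else:
            # An existing but empty column may be read as float (all NaN);
            # cast to object so string responses can be written into it
            df[response_column] = df[response_column].astype(object)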
        
        # Initialize Selenium processor
        logger.info("Initializing web browser...")
        processor = SeleniumChatbotProcessor(chatbot_url, headless=headless)
        
        # Process each question
        total_questions = len(df)
        logger.info(f"Processing {total_questions} questions...")
        
        for index, row in df.iterrows():
            question = row[question_column]
            
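            # Skip if question is empty or response already exists
            if pd.isna(question) or str(question).strip() == "":
                logger.info(f"Skipping empty question at row {index + 1}")
                continue
            
            # Cells holding numbers or dates are not strings; normalize before slicing or sending
            question = str(question).strip()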
            
            if not pd.isna(row[response_column]) and str(row[response_column]).strip() != "":
                logger.info(f"Skipping row {index + 1} - response already exists")
                continue
            
            logger.info(f"Processing question {index + 1}/{total_questions}: {question[:50]}...")
            
            # Send question and get response
            response = processor.send_question(question)
            
            # Store response
            df.at[index, response_column] = response
            logger.info(f"Got response: {response[:100]}...")
            
            # Add delay between requests
            time.sleep(delay_seconds)
        
        # Save results
        output_path = output_file or input_file
        logger.info(f"Saving results to: {output_path}")
        df.to_excel(output_path, index=False)
        
        logger.info("Processing completed successfully!")
        return df
        
    except Exception as e:
        logger.error(f"Error processing Excel file: {e}")
        raise
    
    finally:
        if processor:
            processor.close()

# Installation and setup instructions
def print_setup_instructions():
    """Print setup instructions for required packages"""
    print("="*60)
    print("SETUP INSTRUCTIONS")
    print("="*60)
    print()
    print("1. Install required packages:")
    print("   pip install pandas openpyxl selenium webdriver-manager")
    print()
    print("2. Install Chrome WebDriver:")
    print("   - Download from: https://chromedriver.chromium.org/")
    print("   - Or use: pip install webdriver-manager")
    print("   - Make sure ChromeDriver is in your PATH")
    print()
    print("3. Make sure your chatbot is running at localhost:8501")
    print()
    print("4. Update the selectors in the code if needed for your specific UI")
    print("="*60)

# Example usage
if __name__ == "__main__":
    print_setup_instructions()
    
    # Create sample Excel file
    sample_data = {
        "Questions": [
            "What is artificial intelligence?",
            "How does machine learning work?",
            "What are neural networks?"
        ]
    }
    
    df = pd.DataFrame(sample_data)
    df.to_excel("sample_questions.xlsx", index=False)
    print("Created sample_questions.xlsx")
    
    # Process questions (uncomment to run)
    # result_df = process_excel_with_selenium(
    #     input_file="sample_questions.xlsx",
    #     output_file="questions_with_responses.xlsx",
    #     chatbot_url="http://localhost:8501",
    #     headless=False  # Set to True to run without showing browser
    # )
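
`send_question` above waits a fixed `time.sleep(3)` and never uses its `timeout` argument, so slow or streaming replies may be read too early. One alternative is to poll until the visible text stops changing. The helper below is a sketch under that assumption: `wait_for_stable_text` and its parameters are hypothetical, and `read_text` stands for any callable that returns the current response text, for example one built around `get_response`.

```python
import time
from typing import Callable


def wait_for_stable_text(
    read_text: Callable[[], str],
    timeout: float = 30.0,
    poll_interval: float = 0.5,
    settle_time: float = 1.5,
) -> str:
    """Poll `read_text` until it returns the same non-empty text for `settle_time` seconds.

    Returns the last text seen if `timeout` elapses first (possibly empty).
    """
    deadline = time.monotonic() + timeout
    last_text = ""
    last_change = time.monotonic()
    while time.monotonic() < deadline:
        text = read_text()
        now = time.monotonic()
        if text != last_text:
            # Text is still changing: remember it and restart the settle clock.
            last_text, last_change = text, now
        elif text and now - last_change >= settle_time:
            return text
        time.sleep(poll_interval)
    return last_text
```

Shorter `settle_time` values return sooner but risk cutting off a reply that pauses mid-stream, so the defaults here are a starting point rather than tuned values.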